procedure Greedy(T(X), e, G, θ)

Input: n-attribute table T and n-vector of error tolerances e;
Bayesian network G on the set of attributes X and
threshold θ on the relative benefit for selecting a
CaRT predictor.

Output: A set of materialized (predicted) attributes X_mat (X_pred
= X − X_mat) and a CaRT predictor for each X_i ∈ X_pred.

begin
1. X_mat := X_pred := ∅
2. let <X_1, X_2, ...> be the attributes in X sorted in topological order of G
3. for i := 1, ..., n
4.   if Π(X_i) = ∅ then X_mat := X_mat ∪ {X_i} /* must be materialized if it has no parents in G */
5.   else
6.     M := BuildCaRT(X_mat → X_i, e_i)
7.     if (MaterCost(X_i) / PredCost(X_mat → X_i) > θ) then X_pred := X_pred ∪ {X_i}
8.     else X_mat := X_mat ∪ {X_i}
9.   end
10. end
end



FIGURE 2: The Greedy CaRT Selection Algorithm 
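
The control flow of the Greedy procedure can be illustrated with a minimal Python sketch. It is not the implementation described here, only a reading of the figure under stated assumptions: `parents`, `mater_cost`, and `pred_cost` are hypothetical stand-ins for the Bayesian network G, MaterCost, and the cost of the CaRT returned by BuildCaRT, and `pred_cost` is assumed to return a positive number.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def greedy_select(parents, mater_cost, pred_cost, theta):
    """Sketch of the Greedy selection loop.

    parents:    dict mapping each attribute to the set of its parents in G
    mater_cost: hypothetical callable, attribute -> cost of materializing it
    pred_cost:  hypothetical callable, (predictors, attribute) -> cost of the
                CaRT built from the given predictors (assumed > 0)
    theta:      threshold on the relative benefit of predicting an attribute
    """
    x_mat, x_pred = [], []
    # static_order() yields every attribute after all of its parents.
    for attr in TopologicalSorter(parents).static_order():
        if not parents.get(attr):
            # No parents in G: the attribute has to be materialized.
            x_mat.append(attr)
            continue
        cost = pred_cost(frozenset(x_mat), attr)
        if mater_cost(attr) / cost > theta:
            x_pred.append(attr)   # predicting is sufficiently cheaper
        else:
            x_mat.append(attr)    # otherwise store the attribute itself
    return x_mat, x_pred
```

For a chain a → b → c in which only b can be predicted cheaply, the sketch materializes a and c and predicts b. Because each attribute is visited once, a decision is never revisited after later attributes have been processed.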



procedure MaxIndependentSet(T(X), e, G, neighborhood())

Input: n-attribute table T and n-vector of error tolerances e;
Bayesian network G on the set of attributes X and function
neighborhood() defining the "predictive neighborhood" of a
node X_i in G (e.g., Π(X_i) or β(X_i)).

Output: A set of materialized (predicted) attributes X_mat (X_pred =
X − X_mat) and a CaRT predictor PRED(X_i) → X_i for each X_i ∈ X_pred.

begin
1. X_mat := X, X_pred := ∅
2. PRED(X_i) := ∅ for all X_i ∈ X, improve := true
3. while (improve ≠ false) do
4.   for each X_i ∈ X_mat
5.     mater_neighbors(X_i) := (X_mat ∩ neighborhood(X_i)) ∪ {PRED(X) : X ∈ neighborhood(X_i), X ∈ X_pred} − {X_i}
6.     M := BuildCaRT(mater_neighbors(X_i) → X_i, e_i)
7.     let PRED(X_i) ⊆ mater_neighbors(X_i) be the set of predictor attributes used in M
8.     cost_change_i := 0
9.     for each X_j ∈ X_pred such that X_i ∈ PRED(X_j)
10.      NEW_PRED_i(X_j) := PRED(X_j) − {X_i} ∪ PRED(X_i)
11.      M := BuildCaRT(NEW_PRED_i(X_j) → X_j, e_j)
12.      set NEW_PRED_i(X_j) to the (sub)set of predictor attributes used in M
13.      cost_change_i := cost_change_i + (PredCost(PRED(X_j) → X_j) − PredCost(NEW_PRED_i(X_j) → X_j))
14.    end
15.  end

16.  build an undirected, node-weighted graph G_temp = (X_mat, E_temp) on the current set of materialized
17.  attributes X_mat, where:
18.    (a) E_temp := {(X, Y) : for all pairs X, Y ∈ PRED(X_j) for some X_j ∈ X_pred} ∪
19.        {(X_i, Y) : for all Y ∈ PRED(X_i), X_i ∈ X_mat}
20.    (b) weight(X_i) := MaterCost(X_i) − PredCost(PRED(X_i) → X_i) + cost_change_i for each X_i ∈ X_mat
21.  S := FindWMIS(G_temp) /* select (approximate) maximum weight independent set in G_temp
22.       as "maximum-benefit" subset of predicted attributes */

23.  if (Σ_{X ∈ S} weight(X) ≤ 0) then improve := false
24.  else /* update X_mat, X_pred, and the chosen CaRT predictors */
25.    for each X_j ∈ X_pred
26.      if (PRED(X_j) ∩ S = {X_i}) then PRED(X_j) := NEW_PRED_i(X_j)
27.    end
28.    X_mat := X_mat − S, X_pred := X_pred ∪ S
29.  end
30. end /* while */
end



FIGURE 4: The MaxIndependentSet CaRT Selection Algorithm 
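
The graph-construction and selection steps of the figure can be sketched in Python as follows. The sketch rests on two assumptions that the figure does not settle. First, the edge set is read as joining every pair of attributes that occur together in the predictor set of an already-predicted attribute, and joining each materialized attribute to its own candidate predictors. Second, FindWMIS is not specified in the figure, so `greedy_wmis` below is a hypothetical heaviest-first heuristic standing in for it. The names `build_edges`, `greedy_wmis`, `pred`, and `weight` are illustrative only.

```python
from itertools import combinations


def build_edges(x_mat, x_pred, pred):
    """Return E_temp as a set of two-element frozensets (assumed reading)."""
    edges = set()
    # Attributes that jointly predict an already-predicted X_j may not be
    # selected together, so every pair inside PRED(X_j) is connected.
    for xj in x_pred:
        for x, y in combinations(pred[xj], 2):
            edges.add(frozenset((x, y)))
    # A materialized X_i and the predictors of its candidate CaRT may not
    # both become predicted, so X_i is connected to each of them.
    for xi in x_mat:
        for y in pred[xi]:
            if y != xi:
                edges.add(frozenset((xi, y)))
    return edges


def greedy_wmis(weight, edges):
    """Hypothetical stand-in for FindWMIS: heaviest-first greedy heuristic."""
    adjacent = {v: set() for v in weight}
    for edge in edges:
        u, v = tuple(edge)
        adjacent.setdefault(u, set()).add(v)
        adjacent.setdefault(v, set()).add(u)
    chosen, blocked = set(), set()
    # Visit nodes by decreasing weight; ties are broken by attribute name.
    for v in sorted(weight, key=lambda a: (-weight[a], a)):
        if weight[v] <= 0:
            break  # non-positive weights cannot increase the total benefit
        if v not in blocked:
            chosen.add(v)
            blocked |= adjacent[v]
    return chosen
```

The heuristic gives no optimality guarantee. It only ensures that the returned set is independent in the graph and contains no attribute of non-positive weight, so the total weight of a non-empty result is positive.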



procedure LowerBound(N, e_i, b)

Input: Leaf N for which lower bound on subtree cost is to be
computed; error tolerance e_i for attribute X_i; bound b
on the maximum number of internal nodes in subtree
rooted at N.

Output: Lower bound L(N) on cost of subtree rooted at N.
begin
1. for i := 0 to r
2.   minOut[i, 0] := i
3. for j := 1 to b + 1
4.   minOut[0, j] := 0
5. l := 0
6. for i := 1 to r
7.   while x_i − x_{l+1} > 2e_i
8.     l := l + 1
9.   for j := 1 to b + 1
10.    minOut[i, j] := min {minOut[i − 1, j] + 1, minOut[l, j − 1]}
11. end
12. L(N) := ∞
13. for j := 0 to b
14.   L(N) := min {L(N), 2j + 1 + j log(|X_i|) + (j + 1 + minOut[r, j + 1]) log(|dom(X_i)|)}
15. L(N) := min {L(N), 2b + 3 + (b + 1) log(|X_i|) + (b + 2) log(|dom(X_i)|)}
16. return L(N)
end



FIGURE 5: Algorithm for Estimating Lower Bound on Subtree Cost 
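
The dynamic program in the LowerBound procedure can be traced with the following Python sketch. Several points are assumptions rather than statements of the figure: x_1, ..., x_r are taken to be the values of the predicted attribute at leaf N in sorted order, the logarithms are taken base 2, and the two logarithm arguments are passed in directly as the hypothetical parameters `num_pred_attrs` (for |X_i|) and `dom_size` (for |dom(X_i)|).

```python
import math


def lower_bound(values, tol, b, num_pred_attrs, dom_size):
    """Sketch of the lower bound L(N) on the cost of the subtree rooted at N."""
    xs = sorted(values)
    r = len(xs)
    # min_out[i][j]: fewest outliers among the i smallest values when they
    # may be covered by j intervals of width 2 * tol.
    min_out = [[0] * (b + 2) for _ in range(r + 1)]
    for i in range(r + 1):
        min_out[i][0] = i          # no intervals: every value is an outlier
    l = 0
    for i in range(1, r + 1):
        # x_i - x_{l+1} in the figure is xs[i - 1] - xs[l] with 0-based lists.
        while xs[i - 1] - xs[l] > 2 * tol:
            l += 1
        for j in range(1, b + 2):
            # Either x_i is an outlier, or one interval covers x_{l+1}..x_i.
            min_out[i][j] = min(min_out[i - 1][j] + 1, min_out[l][j - 1])
    log_attrs = math.log2(num_pred_attrs)
    log_dom = math.log2(dom_size)
    best = math.inf
    for j in range(b + 1):
        best = min(best, 2 * j + 1 + j * log_attrs
                   + (j + 1 + min_out[r][j + 1]) * log_dom)
    # Subtrees with more than b internal nodes are bounded by this last term.
    return min(best, 2 * b + 3 + (b + 1) * log_attrs + (b + 2) * log_dom)
```

For the values 1, 2, 10, 11 with tolerance 1, b = 1, four candidate predictor attributes, and a domain of 16 values, one interval leaves two outliers and two intervals leave none, so both j = 0 and j = 1 give a cost of 13, which is the bound returned.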